CHI2_CONTINGENCY

Overview

The CHI2_CONTINGENCY function performs the chi-square test of independence to determine whether there is a statistically significant association between two categorical variables in a contingency table. This test is widely used in survey analysis, A/B testing, and scientific research to assess whether observed frequencies differ meaningfully from what would be expected if the variables were independent.

The function implements Pearson’s chi-squared test, which compares observed frequencies against expected frequencies calculated from the marginal totals of the table. Expected frequencies are computed under the assumption that the row and column variables are independent. The test statistic is calculated as:

\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

where O_{ij} is the observed frequency and E_{ij} is the expected frequency in cell (i, j). Under independence, E_{ij} is the product of the row i total and the column j total divided by the grand total. The degrees of freedom for an R \times C table are (R-1)(C-1).
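The formula above can be sketched directly with NumPy. This is a minimal illustration and not the code SciPy runs internally; the helper name pearson_chi2 is hypothetical, and the table is the one from Example 1.

```python
import numpy as np

def pearson_chi2(observed):
    """Return (statistic, dof, expected) for a 2D table of counts."""
    obs = np.asarray(observed, dtype=float)
    row_totals = obs.sum(axis=1, keepdims=True)      # shape (R, 1)
    col_totals = obs.sum(axis=0, keepdims=True)      # shape (1, C)
    # E_ij = (row i total * column j total) / grand total
    expected = row_totals @ col_totals / obs.sum()
    statistic = ((obs - expected) ** 2 / expected).sum()
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return float(statistic), dof, expected

stat, dof, expected = pearson_chi2([[10, 10, 20], [20, 20, 20]])
print(round(stat, 4), dof)  # 2.7778 2
```

The sketch applies no continuity correction, so it agrees with the function only for tables larger than 2×2 or when correction is disabled.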

For 2×2 tables, Yates’ correction for continuity can be applied (enabled by default). This correction adjusts each observed value by 0.5 toward its expected value, producing a more conservative test that better approximates the chi-square distribution when sample sizes are small.
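To see the effect of the correction, the sketch below runs scipy.stats.chi2_contingency on the 2×2 table from Examples 2 and 3 with and without it, then recomputes the corrected statistic by hand. The manual formula assumes every |O - E| is at least 0.5, which holds for this table.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = [[12, 3], [17, 16]]

# Tuple unpacking: statistic, p-value, degrees of freedom, expected table
stat_corr, p_corr, dof, expected = chi2_contingency(table, correction=True)
stat_unc, p_unc, _, _ = chi2_contingency(table, correction=False)

# Manual Yates statistic: shrink each |O - E| by 0.5 before squaring
obs = np.asarray(table, dtype=float)
manual = ((np.abs(obs - expected) - 0.5) ** 2 / expected).sum()

print(round(stat_corr, 4), round(stat_unc, 4))
```

The corrected statistic is the smaller of the two, so its p-value is larger, which is what makes the corrected test more conservative.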

The function also supports the Cressie-Read power divergence family of statistics through the lambda_ parameter, allowing alternatives such as the log-likelihood ratio (G-test) to be computed instead of Pearson’s chi-squared statistic.
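As an illustration, the sketch below requests the G-test by passing lambda_="log-likelihood" to scipy.stats.chi2_contingency and compares the result with the textbook formula G = 2 Σ O ln(O / E). The table is the 3×2 table from Example 4; because it has more than one degree of freedom, no continuity correction is involved.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10, 20], [20, 15], [15, 25]], dtype=float)

# Default statistic (Pearson) and the log-likelihood ratio (G-test)
pearson_stat, pearson_p, dof, expected = chi2_contingency(table)
g_stat, g_p, _, _ = chi2_contingency(table, lambda_="log-likelihood")

# Manual G statistic; valid here because no observed cell is zero
manual_g = 2.0 * (table * np.log(table / expected)).sum()

print(round(pearson_stat, 4), round(g_stat, 4), dof)
```

Both statistics are referred to the same chi-square distribution with (R-1)(C-1) degrees of freedom, so they usually lead to similar conclusions for moderately large counts.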

This implementation uses the scipy.stats.chi2_contingency function from SciPy. The function returns the test statistic, p-value, degrees of freedom, and the expected frequency table. A commonly cited guideline is that the expected frequency in every cell should be at least 5 for the chi-square approximation to be reliable.
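One way to check this guideline before running the test is sketched below. It uses scipy.stats.contingency.expected_freq to compute the expected table; the helper name small_expected_cells and the default threshold of 5 are assumptions made for illustration and are not part of CHI2_CONTINGENCY.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

def small_expected_cells(observed, threshold=5.0):
    """List (row, col, expected) for cells whose expected count is below threshold."""
    expected = expected_freq(np.asarray(observed, dtype=float))
    return [
        (int(i), int(j), float(expected[i, j]))
        for i, j in np.argwhere(expected < threshold)
    ]

ok_table = small_expected_cells([[12, 3], [17, 16]])   # smallest expected is 5.9375
sparse_table = small_expected_cells([[3, 1], [2, 9]])  # several cells fall below 5
print(ok_table, len(sparse_table))
```

An empty list suggests the approximation is reasonable under this rule of thumb; a non-empty list points to the cells that may warrant combining categories or using an exact test instead.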

This example function is provided as-is without any representation of accuracy.

Excel Usage

=CHI2_CONTINGENCY(observed, correction, lambda_)
  • observed (list[list], required): Contingency table of observed frequencies. Each cell must be a non-negative number. Must have at least two rows and two columns.
  • correction (bool, optional, default: true): If True and the table is 2x2, applies Yates’ correction for continuity.
  • lambda_ (str, optional, default: null): Name of a statistic from the Cressie-Read power divergence family, for example "log-likelihood" for the G-test. Use None for Pearson’s chi-squared statistic.

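Returns (list[list]): 2D list whose first row is [stat, p, dof] and whose remaining rows are the expected frequency table. Shorter rows are padded with empty strings so that every row has the same width. Returns an error string if the input is invalid or SciPy raises an error.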
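When the Python function is called directly, the combined layout can be split back into its parts. The sketch below is one possible approach; the helper split_result is hypothetical, and the sample value is the rounded output of Example 2 with the empty-string padding that the implementation adds for two-column tables.

```python
def split_result(result):
    """Split the 2D output into (stat, pvalue, dof, expected_table)."""
    if isinstance(result, str):
        # The function reports problems as a plain error string
        raise ValueError(result)
    stat, pvalue, dof = result[0][:3]
    # Drop the "" padding cells from the expected-frequency rows
    expected = [[cell for cell in row if cell != ""] for row in result[1:]]
    return stat, pvalue, dof, expected

sample = [
    [2.4091, 0.1206, 1],
    [9.0625, 5.9375, ""],
    [19.9375, 13.0625, ""],
]
stat, pvalue, dof, expected = split_result(sample)
print(stat, pvalue, dof, expected)
```

Raising on an error string is a design choice for this sketch; callers that prefer to display the message in a cell can return it unchanged instead.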

Examples

Example 1: Demo case 1

Inputs:

observed      correction
10  10  20    true
20  20  20

Excel formula:

=CHI2_CONTINGENCY({10,10,20;20,20,20}, TRUE)

Expected output:

Result
2.7778 0.2494 2
12 12 16
18 18 24

Example 2: Demo case 2

Inputs:

observed    correction
12   3      true
17  16

Excel formula:

=CHI2_CONTINGENCY({12,3;17,16}, TRUE)

Expected output:

Result
2.4091 0.1206 1
9.0625 5.9375
19.9375 13.0625

Example 3: Demo case 3

Inputs:

observed    correction
12   3      false
17  16

Excel formula:

=CHI2_CONTINGENCY({12,3;17,16}, FALSE)

Expected output:

Result
3.4988 0.0614 1
9.0625 5.9375
19.9375 13.0625

Example 4: Demo case 4

Inputs:

observed    correction
10  20      true
20  15
15  25

Excel formula:

=CHI2_CONTINGENCY({10,20;20,15;15,25}, TRUE)

Expected output:

Result
4.4965 0.1056 2
12.8571 17.1429
15 20
17.1429 22.8571

Python Code

from scipy.stats import chi2_contingency as scipy_chi2_contingency

def chi2_contingency(observed, correction=True, lambda_=None):
    """
    Perform the chi-square test of independence for variables in a contingency table.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        observed (list[list]): Contingency table of observed frequencies. Each cell must be a non-negative number. Must have at least two rows and two columns.
        correction (bool, optional): If True and the table is 2x2, applies Yates' correction for continuity. Default is True.
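        lambda_ (str, optional): Name of a statistic from the Cressie-Read power divergence family, for example "log-likelihood" for the G-test. Use None for Pearson's chi-squared statistic. Default is None.

    Returns:
        list[list]: 2D list whose first row is [stat, p, dof] and whose remaining rows are the expected frequency table, padded with empty strings to a common width; or an error string.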
    """
    def to2d(x):
        return [[x]] if not isinstance(x, list) else x

    observed = to2d(observed)

    # Validate observed is a 2D list
    if not isinstance(observed, list) or not all(isinstance(row, list) for row in observed):
        return "Invalid input: observed must be a 2D list."
    if len(observed) < 2:
        return "Invalid input: observed must have at least two rows."
    if not all(len(row) >= 2 for row in observed):
        return "Invalid input: observed must have at least two columns per row."
    # Check all values are non-negative numbers
    try:
        obs_arr = [[float(cell) for cell in row] for row in observed]
        if any(cell < 0 for row in obs_arr for cell in row):
            return "Invalid input: all observed frequencies must be non-negative."
    except (TypeError, ValueError):
        return "Invalid input: observed must contain only numbers."

    try:
        res = scipy_chi2_contingency(obs_arr, correction=bool(correction), lambda_=lambda_)
        stat = float(res.statistic)
        pval = float(res.pvalue)
        dof = int(res.dof)
        expected = res.expected_freq.tolist()

        # The stats row always has 3 entries; the expected table may be wider or narrower
        stats_row = [stat, pval, dof]
        expected_cols = len(expected[0]) if expected else 0

        # Pad stats row if expected has more columns
        if expected_cols > 3:
            stats_row = stats_row + [""] * (expected_cols - 3)

        # Pad expected rows if stats has more columns (3 cols for 2-column tables)
        if expected_cols < 3:
            expected = [row + [""] * (3 - expected_cols) for row in expected]

        output = [stats_row]
        output.extend(expected)
        return output
    except Exception as e:
        return f"scipy.stats.chi2_contingency error: {e}"
